1. Introduction

Cameras capturing > 1 gigapixel [1], > 10,000 frames per second [2], > 10 ranges [3] and > 100 color channels [4] demonstrate that the optical data stream entering an aperture may approach 1 exapixel/second. While simultaneous capture of all the optical information is not precluded by diffraction or quantum limits, this capacity is beyond the capability of current electronics. Previous compressive sampling proposals to reduce read-out bandwidth paradoxically increase system volume [5,6] or operating bandwidth [7-9]. Here we use coded apertures to compress spatial and temporal sampling by > 4x and > 10x, respectively, without substantially increasing system volume or power. We describe computational estimation of 148 high-speed temporal frames from a single captured frame. By combining physical-layer compression as demonstrated here with sensor-layer compressive sampling strategies [10,11], and by using multiscale design to parallelize read-out, one may imagine streaming full exapixel data cubes over practical communications channels.

Under ambient illumination, cameras commonly capture 10-1000 pW per pixel, corresponding to 10^ photons per second per pixel. At this flux, frame rates approaching 10^ per second may still provide useful information. Unfortunately, frame rate generally falls well below this limit due to read-out electronics. The power necessary to operate an electronic focal plane is proportional to the rate at which pixels are read out [12]. Current technology requires pW to nW per pixel, implying 1-100 W per sq. mm of sensor area for full optical data cube capture. This power requirement cascades as image data flows through the pipeline of read-out, processing, storage, communications, and display. Given the need to maintain low focal plane temperatures, this power density is unsustainable.

While interesting features persist on all resolvable spatial and temporal scales, the true information content of natural fields is much less than the photon flux because diverse spatial, spectral, and temporal channels contain highly correlated information [13]. Under these circumstances, feature-specific [14] and compressive [15,16] multiplex measurement strategies developed over the past decade have been shown to maintain image information even when the number of digital measurement values is substantially less than the number of pixels resolved. Compressive measurement for visible imaging has been implemented using spatial light modulators (SLMs) to code pixel values incident on a single detector [8,9]. Unfortunately, this strategy increases, rather than decreases, operating power and bandwidth because the increased data load in the encoding signal is much greater than the decreased data load of the encoded signal. If the camera estimates $F$ frames with $N$ pixels per frame from $M$ measurements, then $MN$ control signals enter the modulator to obtain $FN$ pixels. The bandwidth into the compressive camera therefore exceeds the output bandwidth needed by a conventional camera by the factor $M/F = NC$, where $C = M/(FN)$ is the compression ratio. Unless $C < 1/N$, implying that the camera takes less than one measurement per frame, the control bandwidth exceeds the bandwidth necessary to fully read a conventional imager. This problem is slightly less severe in coding strategies derived from "flutter shutter" [17] motion compensation strategies. The flutter shutter uses full-frame temporal modulation to encode motion blur for inversion. Several studies have implemented per-pixel flutter shutter using spatial light modulators for video compression [7,18,19]. If we assume that these systems reconstruct at the full frame rate of the modulator, the control bandwidth is exactly equal to the bandwidth needed to read a conventional camera operating at the decompressed frame rate. Alternatively, one may entirely avoid these problems by using parallel camera arrays with independent per-pixel codes [5,6], at the cost of increasing camera volume and cost by a factor of $M$.
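The bandwidth argument above can be checked with exact rational arithmetic. This sketch assumes $C = M/(FN)$ (measurements per estimated voxel), the definition under which the stated factor $NC$ holds; the function name and the toy sizes are illustrative choices, not values from this work.

```python
from fractions import Fraction


def control_to_readout_ratio(n_pixels: int, n_frames: int, n_measurements: int) -> Fraction:
    """Ratio of modulator control values to conventional read-out values.

    A single-detector compressive camera that estimates n_frames frames of
    n_pixels pixels from n_measurements measurements must load one
    n_pixels-element code per measurement (M*N control values), whereas a
    conventional camera reads F*N pixel values.
    """
    control = n_measurements * n_pixels  # M N control signals into the SLM
    readout = n_frames * n_pixels        # F N pixels from a conventional camera
    return Fraction(control, readout)


N, F = 256 * 256, 10   # toy frame size and frame count (illustrative only)
M = N                  # toy measurement count, giving C = M / (F N) = 1/10
C = Fraction(M, F * N)

ratio = control_to_readout_ratio(N, F, M)
print(C, ratio)                            # 1/10 32768/5
print(ratio == N * C)                      # True
print(control_to_readout_ratio(N, F, 5))   # 1/2 (fewer measurements than frames)
```

Only when $M < F$, i.e. $C < 1/N$, does the ratio fall below one, which is the regime the text describes as taking less than one measurement per frame.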

Here we propose mechanical translation of a passive coded aperture for low-power space-time compressive measurement. Coding is implemented by a chrome-on-glass binary transmission mask in an intermediate image plane. In contrast with previous approaches, modulation of the image data stream by harmonic oscillation of this mask requires no code transmission or operating power. We have previously used such masks for compressive imaging in coded aperture snapshot spectral imagers (CASSI) [20], which include an intermediate image plane before a spectrally dispersive relay optic. Here we demonstrate coded aperture compressive temporal imaging (CACTI). CASSI and CACTI share identical mathematical forward models. In CASSI, each plane in the spectral datacube is modulated by a shifted code. Dispersion through a grating or prism shifts spectral planes after coded aperture modulation. Detection integrates the spectral planes, but the datacube can be recovered by isolating each spectral plane based on its local code structure. This process may be viewed as code division multiple access (CDMA). In CACTI, translation of the coded aperture during exposure means that each temporal plane in the video stream is modulated by a shifted version of the code, thereby attaining per-pixel modulation using no additional sensor bandwidth.

Signal separation once again works by CDMA. We isolate the object's temporal channels 
from the compressed data by inverting a highly-underdetermined system of equations. By using 
an iterative reconstruction algorithm, we may estimate several high-speed video frames from a 
single coded measurement. 

2. Theory 

One may view CACTI's CDMA sensing process as uniquely patterning high-speed spatiotemporal object voxels $f(x,y,t)$ with a transmission function that shifts in time (Fig. 1). Doing so applies distinct local coding structures to each temporal channel prior to integrating the channels as limited-framerate images $g(x',y',t')$ on the $N$-pixel detector. An $N_F$-frame, high-speed estimate of $f(x,y,t)$ may be reconstructed from each low-speed coded snapshot $g(x',y',t')$, with $t' < t$.

Considering only one spatial dimension ($(x,y) \rightarrow x$) and respectively denoting object- and image-space coordinates with unprimed and primed variables, the sampled data $g(x',t')$ consists of discrete samples of the continuous transformation

$$ g(x',t') = \int_{1}^{N_F}\!\int_{1}^{N} f(x,t)\, T\big(x - s(t)\big)\, \mathrm{rect}\!\left(\frac{x - x'}{\Delta_x}\right) \mathrm{rect}\!\left(\frac{t - t'}{\Delta_t}\right) dx\, dt, \tag{1} $$

where $T(x - s(t))$ represents the transmission function of the coded aperture, $\Delta_x$ is the detector pixel size, $\mathrm{rect}(\cdot)$ is the pixel sampling function, and $\Delta_t$ is the temporal integration time. $s(t)$ describes the coded aperture's spatial position during the camera's integration window.

One may analyze the expected temporal resolution of the coded data by considering the Fourier transform of Eq. (1). Assuming the coded aperture moves linearly during $\Delta_t$ such that $s(t) = \nu t$, the image's temporal spectrum is given by

$$ \hat{g}(u,v) = \mathrm{sinc}(u\Delta_x)\,\mathrm{sinc}(v\Delta_t) \int \hat{f}(u - w,\, v - \nu w)\, \hat{T}(w)\, dw, \tag{2} $$

where $\hat{f}(u,v)$ is the 2D Fourier transform of the space-time datacube and $\hat{T}(w)$ is the 1D Fourier transform of the spatial code. Without the coded aperture, $\hat{g}(u,v) = \mathrm{sinc}(u\Delta_x)\,\mathrm{sinc}(v\Delta_t)\,\hat{f}(u,v)$, and the sampled data stream is proportional to the object video low-pass filtered by the pixel sampling functions. Achievable resolution is proportional to $\Delta_x$ in $x$ and $\Delta_t$ in time. The moving code aliases higher-frequency components of the object video into the passband of the detector sampling functions. The support of $\hat{T}(w)$ extends to some multiple of $1/\Delta_c$, where $\Delta_c$ is the code feature size (in units of detector pixels), meaning that the effective passband may be increased by a factor proportional to $1/\Delta_c$ in space and $\nu/\Delta_c$ in time. In practice, finite mechanical deceleration times cause $\hat{T}(w)$ to have significant DC and low-frequency components in addition to the dominant temporal frequency $v = \nu/\Delta_c$; hence, high and low frequencies alike are aliased into the system's passband.



[Fig. 1 panel title: detection scheme]

Fig. 1. Detection process. (a) A discrete space-time source datacube is (b) multiplied at each of $N_F$ temporal channels with a shifted version of a coded aperture pattern. (c) Each detected frame $g$ is the summation of the coded temporal channels and contains the object's spatiotemporally multiplexed information. The dark grey (red-outlined) and black detected pixels in (c) pictorially depict the code's location at the beginning and the end of the camera's integration window, respectively.

Considering a square $N$-pixel active sensing area, the discretized form of the three-dimensional scene is $f \in \mathbb{R}^{\sqrt{N} \times \sqrt{N} \times N_F}$, a $(\sqrt{N} \times \sqrt{N}) \times N_F$-voxel spatiotemporal datacube. In CACTI, a time-varying spatial transmission pattern $T \in \mathbb{R}^{\sqrt{N} \times \sqrt{N} \times N_F}$ uniquely codes each of the $N_F$ temporal channels of $f$ prior to integrating them into one detector image $g \in \mathbb{R}^{\sqrt{N} \times \sqrt{N}}$ during $\Delta_t$. These measurements at spatial indices $(i,j)$ and temporal index $k$ are given by

$$ g_{i,j} = \sum_{k=1}^{N_F} T_{i,j,k}\, f_{i,j,k} + n_{i,j}, \tag{3} $$

where $n_{i,j}$ represents imaging noise at the $(i,j)$th pixel. One may rasterize the discrete object $f \in \mathbb{R}^{N N_F \times 1}$, image $g \in \mathbb{R}^{N \times 1}$, and noise $n \in \mathbb{R}^{N \times 1}$ to obtain the linear transformation given by

$$ g = Hf + n, \tag{4} $$

where $H \in \mathbb{R}^{N \times N N_F}$ is the system's discrete forward matrix that accounts for sampling factors including the optical impulse response, pixel sampling function, and time-varying transmission function. The forward matrix is a 2-dimensional representation of the 3-dimensional transmission function $T$:

$$ H_k = \mathrm{diag}\left[T_{1,1,k},\, T_{2,1,k},\, \ldots,\, T_{\sqrt{N},\sqrt{N},k}\right], \quad k = 1, \ldots, N_F, \tag{5} $$

$$ H \stackrel{\text{def}}{=} \left[H_1\ H_2\ \cdots\ H_{N_F}\right], \tag{6} $$

where $H_k \in \mathbb{R}^{N \times N}$ is a matrix containing the entries of $T_k$ along its diagonal and $H$ is a concatenation of all $H_k$, $k \in \{1, \ldots, N_F\}$. Fig. 2 underlines the role $H$ plays in the linear transformation.
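As a concrete check of Eqs. (3)-(6), the NumPy sketch below builds $H$ densely for a toy 4 × 4 × 3 datacube and confirms that $Hf$ reproduces the pixel-wise sum of coded channels. Dense matrices are only practical at toy sizes; the column-major rasterization is an assumption chosen to match the ordering $T_{1,1,k}, T_{2,1,k}, \ldots$ in Eq. (5), and `forward_matrix` is a hypothetical helper name.

```python
import numpy as np


def forward_matrix(T: np.ndarray) -> np.ndarray:
    """Hypothetical helper: build H = [H_1 H_2 ... H_NF] from a (rows, cols, N_F) code stack.

    Each H_k is diagonal and holds the k-th code frame rasterized column-major,
    matching the ordering T_{1,1,k}, T_{2,1,k}, ... of Eq. (5).
    """
    rows, cols, n_f = T.shape
    blocks = [np.diag(T[:, :, k].ravel(order="F")) for k in range(n_f)]
    return np.hstack(blocks)  # shape (N, N * N_F) with N = rows * cols


rng = np.random.default_rng(0)
T = (rng.random((4, 4, 3)) < 0.5).astype(float)  # toy binary codes, about 50% open
f = rng.random((4, 4, 3))                         # toy space-time datacube

H = forward_matrix(T)
g_matrix = H @ f.ravel(order="F")                 # Eq. (4), noise omitted
g_direct = (T * f).sum(axis=2).ravel(order="F")   # Eq. (3), noise omitted
print(H.shape, np.allclose(g_matrix, g_direct))   # (16, 48) True
```

Raveling the whole 3D array with `order="F"` stacks the column-major frames one after another, which is exactly the block order of $[H_1\ H_2\ \cdots\ H_{N_F}]$.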
Fig. 2. Linear system model. $N_F$ subframes of high-speed data $f$ are estimated from a single snapshot $g$. The forward model matrix $H$ has many more columns than rows and has dimensions $N \times (N \times N_F)$.

Fig. 3. Waveform choices for $s(t)$. Yellow: signal from function generator. Blue: actual hardware motion. Note the poor mechanical response to sharp rising/falling edges in (a) and (b). The sine wave (d) is undesirable because of the nonuniform exposure time of different $T_k$.




At the $k$th temporal channel, the coded aperture's transmission function $T$ is given by

$$ T_k = \mathrm{Rand}\left(\sqrt{N}, \sqrt{N}, s_k\right), \tag{7} $$

where $\mathrm{Rand}(m,n,p)$ denotes a 50%, $m \times n$ random binary matrix shifted vertically by $p$ pixels (optimal designs could be considered for this system as well), and $s_k$ discretely approximates $s(t)$ at the $k$th temporal channel by

$$ s_k = C\,\mathrm{Tri}\left[\frac{k}{2\Delta_t}\right], \tag{8} $$

where $C$, the system's compression ratio, is the amplitude traversed by the code in units of detector pixels. $\mathrm{Tri}[\cdot]$ represents a discrete triangle wave whose period is twice the integration time. The camera integrates during the $C$-pixel sweep of the coded aperture on the image plane and detects a linear combination of $C$ uniquely coded temporal channels of $f$.
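The sketch below generates a code stack in the spirit of Eqs. (7) and (8). It is an illustration under stated assumptions, not the authors' code: $\Delta_t$ is expressed in units of temporal channels (so the triangle period is $2N_F$ channels and one integration covers one up-sweep), shifts are rounded to whole pixels (the $d = 1$ case), and vacated rows are zero-filled. The helpers `rand_code`, `shift_down`, `tri`, and `make_codes` are hypothetical names.

```python
import numpy as np


def rand_code(m: int, n: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for Rand(m, n, 0): each element open with probability 0.5."""
    rng = np.random.default_rng(seed)
    return (rng.random((m, n)) < 0.5).astype(float)


def shift_down(mask: np.ndarray, p: int) -> np.ndarray:
    """Shift a mask down by p >= 0 whole pixels, zero-filling the vacated rows."""
    out = np.zeros_like(mask)
    out[p:, :] = mask[: mask.shape[0] - p, :]
    return out


def tri(x: np.ndarray) -> np.ndarray:
    """Unit-amplitude triangle wave of period 1: 0 at x = 0, peak 1 at x = 0.5."""
    frac = x - np.floor(x)
    return 1.0 - np.abs(2.0 * frac - 1.0)


def make_codes(base: np.ndarray, C: int, n_f: int) -> tuple[np.ndarray, np.ndarray]:
    """Return mask positions s_k and the stack T_k = base shifted by s_k.

    Assumes Delta_t = n_f channels, so Eq. (8) becomes s_k = C * Tri[k / (2 n_f)].
    Channels are indexed from 0 here rather than from 1 as in the text.
    """
    k = np.arange(n_f)
    s = np.rint(C * tri(k / (2.0 * n_f))).astype(int)  # round to whole pixels
    T = np.stack([shift_down(base, int(p)) for p in s], axis=2)
    return s, T


base = rand_code(32, 32)
s, T = make_codes(base, C=14, n_f=14)
print(s.tolist())  # 0 through 13: one whole pixel per channel (d = 1)
print(T.shape)     # (32, 32, 14)
```

Because every $T_k$ is the same base pattern displaced by $s_k$, only the scalar position needs to change over time, which is what lets a passively translated mask replace per-pixel modulator control.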



[Fig. 4 legend: real mask motion s(t) (continuous); approximate mask motion s_k (discrete)]

Fig. 4. Continuous motion and discrete approximation to coded aperture movement during integration time. The discrete triangle function $s_k$ more accurately approximates the continuous triangle wave driving the mask with smaller values of $d$ but adds more columns to $H$.




Fig. 5. Temporal channels used for the reconstruction. Red lines indicate which subset of transverse mask positions $s_k$ were utilized to construct the forward matrix $H$. Blue lines represent the camera integration windows. (a) Calibrating with fewer $s_k$ results in a better-posed inverse problem but does not approximate the temporal motion $s(t)$ as closely. (b) With $d = 1$, each pixel integrates several unique coding patterns with a temporal separation $\Delta t = \Delta_t / C$. (c) Constructing $H$ with large $N_F$ ($d < 1$) interpolates the motion occurring between the uniquely coded image frames.



Periodic triangular motion lets us use the same discrete forward matrix $H$ to reconstruct any given snapshot $g$ within the acquired video while adhering to the hardware's mechanical acceleration limitations (Fig. 3). The discrete motion $s_k$ (Eq. (8)) closely approximates the analog triangle waveform supplied by the function generator (Fig. 4).

Let $d$ represent the number of detector pixels the mask moves between adjacent temporal channels $s_k$ and $s_{k+1}$. The number of frames $N_F$ reconstructed from a single coded snapshot is given by

$$ N_F = \frac{C}{d}; \tag{9} $$

thus, altering $d$ changes the number of reconstructed frames for a given compression ratio $C$.
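Eq. (9) is easy to sanity-check with exact fractions. The sketch below uses the 9.9 µm pixel pitch and 0.99 µm calibration step quoted in Section 3 together with the full 16-pixel sweep, which reproduces the 160-frame calibration set; `n_frames` is an illustrative name, and working in integer nanometres is simply a way to avoid floating-point round-off.

```python
from fractions import Fraction


def n_frames(C, d) -> Fraction:
    """Eq. (9): N_F = C / d, with the sweep C and step d both in detector pixels."""
    return Fraction(C) / Fraction(d)


print(n_frames(14, 1))            # 14: critical encoding, d = 1

pixel_nm, step_nm = 9900, 990     # 9.9 um detector pixel, 0.99 um calibration step
d = Fraction(step_nm, pixel_nm)   # step expressed in detector pixels
print(d, n_frames(16, d))         # 1/10 160
```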

In the case of $d = 1$ (i.e., $N_F = C$), the detector pixels that sense the continuous, temporally modulated object $f$ are critically encoded; each pixel integrates a series of nondegenerate mask patterns (Fig. 5(b)) during $\Delta_t$.

When $d < 1$ (i.e., $N_F > C$), every $(1/d)$-th temporal channel of $H$ will contain nondegenerate temporal code information. These channels will reconstruct as if the sensing pixels are critically encoded. The other temporal slices will interpolate the motion between critically encoded temporal channels (Fig. 5(c)). Generally, this interpolation accurately estimates the direction of the motion between these critically encoded states but retains most of the residual motion blur.
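To make the "every $(1/d)$-th channel" statement concrete, the sketch below lists which channel indices land on whole-pixel mask offsets. It assumes $d = 1/r$ for an integer $r$ and that channel 0 is aligned with a pixel boundary; `critical_channels` is a hypothetical helper, and the test values are toy choices.

```python
from fractions import Fraction


def critical_channels(C: int, d: Fraction) -> list[int]:
    """Channel indices k (0-based) whose mask offset k * d is a whole number of pixels.

    For d = 1/r these are every r-th channel; the channels in between only
    interpolate the mask motion between critically encoded states.
    """
    n_f = int(Fraction(C) / d)  # Eq. (9)
    return [k for k in range(n_f) if (k * d).denominator == 1]


print(critical_channels(4, Fraction(1, 3)))  # [0, 3, 6, 9]
```

Exactly $C$ channels are critically encoded regardless of $d$, so shrinking $d$ adds interpolated frames rather than new independent code states.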

3. Experimental Hardware 

The experimental prototype camera (Fig. 6) consists of a 50 mm camera objective lens (Computar), a lithographically patterned chrome-on-quartz coded aperture with anti-reflective coating for visible wavelengths [20] (Eq. (7)) mounted upon a piezoelectric stage (Newport Co.), an F/8 achromatic relay lens (Edmund Optics), and a 640 × 480 FireWire IEEE 1394a monochrome CCD camera (Marlin AVT).



[Fig. 6 labels: objective lens; coded aperture]

Fig. 6. CACTI prototype hardware setup. The coded aperture is 5.06 mm × 4.91 mm and spans 248 × 256 detector pixels. The function generator moves the coded aperture and triggers camera acquisition with signals from its function and SYNC outputs, respectively.



The objective lens images the continuous scene $f$ onto the piezo-positioned mask. The function generator (Stanford Research Systems DS345) drives the piezo with a 10 V peak-to-peak, 15 Hz triangle wave to locally code the image plane while the camera integrates. We operate at this low frequency to accommodate the piezo's finite mechanical deceleration time.

To ensure the CDMA process remains time-invariant, we use the function generator's SYNC output to generate a 15 Hz square wave. We frequency-double this signal using an FPGA device (Altera) to trigger one camera integration during each upward and each downward sweep of the mask (Fig. 5).

The relay lens images the spatiotemporally modulated scene onto the camera, which saves the 30 fps coded snapshots to a local computer. $N_F$ video frames of the discrete scene $f$ are later reconstructed offline from each coded image $g$ by the Generalized Alternating Projection (GAP) algorithm [21].

During $\Delta_t$, the piezo can move over a range of 0-160 µm vertically in the $(x,y)$ plane. Using 158.4 µm of this stroke moves the coded aperture eight 19.8 µm elements (sixteen 9.9 µm detector pixels) during each camera integration period $\Delta_t$. Using larger strokes for a given modulation frequency is possible and would increase $C$.
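The stroke and timing figures above fit together as follows. This is a back-of-the-envelope sketch using only numbers quoted in this section; the "nominal reconstructed frame rate" line is derived arithmetic that assumes each exposure spans one full sweep, not a measured value.

```python
from fractions import Fraction

stroke_nm = 158_400   # 158.4 um of piezo stroke used per integration
element_nm = 19_800   # coded-aperture element pitch (19.8 um)
pixel_nm = 9_900      # detector pixel pitch (9.9 um)
print(stroke_nm // element_nm, stroke_nm // pixel_nm)  # 8 16

drive_hz = 15                    # triangle-wave drive frequency
captures_per_s = 2 * drive_hz    # one integration per up-sweep and per down-sweep
delta_t = Fraction(1, captures_per_s)
print(captures_per_s, delta_t)   # 30 1/30

for n_f in (14, 148):
    # Nominal rate if N_F frames are recovered from every 1/30 s snapshot.
    print(n_f, n_f * captures_per_s)  # prints "14 420", then "148 4440"
```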

[Fig. 7 panels: spatial modulation; spatiotemporal modulation; temporal mask images diagonalized to form forward matrix H]

Fig. 7. Spatial and temporal modulation. (a) A stationary coded aperture spatially modulates the image. (b) Moving the coded aperture during the integration window applies local code structures to temporal channels, effectively shearing the coded space-time datacube and providing per-pixel flutter shutter. (c) Imaging the (stationary) mask at positions $d$ pixels apart and storing them into the forward matrix $H$ simulates the mask's motion, thereby conditioning the inversion.

Importantly, using a piezoelectric stage itself is not the optimal solution to translate a coded aperture during $\Delta_t$. This device was preferable for the hardware prototype because of its precision and convenient built-in Matlab interface. However, a low-resistance spring system could, in principle, serve the same purpose while using very little power.



3.1. Forward Model Calibration 



We calibrate the system forward model by imaging the aperture code under uniform, white illumination at discrete spatial steps $s_k$ according to Eq. (8). Steps are $d$ detector pixels apart over the coded aperture's range of motion (Figs. 7(c) and 5). This accounts for system misalignments and relay-side aberrations. A Matlab routine controls the piezoelectric stage position during calibration. Since Matlab cannot generate a near-analog waveform for continuous motion, we connect the piezoelectric motion controller to the function generator via serial port during experimental capture.

We use an active area of 281 × 281 detector pixels to account for the 248 × 256-pixel coded aperture's motion $s_k$, with additional zero-padding. We choose $d = 0.1$ detector pixels to provide a substantial basis with which to construct the forward model while remaining well above the piezo's 0.0048-pixel RMS jitter. Storing every temporal channel spaced $d = 0.99$ µm apart into $H$ results in $N_F = 160$ reconstructed frames.

When reconstructing, we may diagonalize any subset of temporal slices of the 281 × 281 × 160 set of mask images into the forward model (Fig. 7). We found the optimum subset of mask positions within this 160-frame set of $s_k$ through iterative $\|g - H f_e\|_2$-error reconstruction tests, where $f_e$ is GAP's $N_F$-frame estimate of the continuous motion $f$. From these tests, we chose and compared two numbers of frames to reconstruct per measurement, $N_F = C = 14$ and $N_F = 148$. $H$ has dimensions $281^2 \times (281^2 \times N_F)$ for both of these cases.
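For $N_F = 148$, a dense $H$ would have $281^2 \times (281^2 \times 148)$ entries, yet at most $281^2 \times 148$ of them can be nonzero, so a sparse representation is the natural container. The sketch below is a hypothetical helper built on `scipy.sparse`: it stacks a chosen subset of calibration images into $H$ with the same column-major rasterization as Eq. (5). The subset used here is arbitrary; the text selects its subset by reconstruction-error tests.

```python
import numpy as np
import scipy.sparse as sp


def sparse_forward_matrix(calib: np.ndarray, keep: list[int]):
    """Hypothetical helper: stack selected calibration mask images into H = [H_1 ... H_NF].

    calib : (rows, cols, K) stack of mask images recorded at positions s_k.
    keep  : indices of the positions used in the forward model (N_F = len(keep)).
    Each block is diag(mask image), rasterized column-major as in Eq. (5).
    """
    blocks = [sp.diags(calib[:, :, k].ravel(order="F"), 0) for k in keep]
    return sp.hstack(blocks, format="csr")


rng = np.random.default_rng(1)
calib = rng.random((6, 5, 10))   # toy stand-in for the 281 x 281 x 160 calibration set
keep = [0, 2, 4, 6, 8]           # hypothetical subset of mask positions
H = sparse_forward_matrix(calib, keep)

f = rng.random((6, 5, len(keep)))
g = H @ f.ravel(order="F")
g_ref = (calib[:, :, keep] * f).sum(axis=2).ravel(order="F")
print(H.shape, np.allclose(g, g_ref))  # (30, 150) True
```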

As seen in Figs. 10-13, decreasing $d$ and estimating up to 148 frames from a single exposure $g$ does not significantly reduce the aesthetic quality of the inversion results, nor does it significantly affect the residual error (Fig. 8(b)). The reconstruction time increases approximately linearly with $N_F$, as shown in Fig. 8(a).

4. Reconstruction Algorithm 

Since $H$ multiplexes many local code patterns of the continuous object onto the discrete-time image $g$, inverting Eq. (4) for $f$ becomes difficult as $N_F$ increases. Least-squares, pseudoinverse, and other linear inversion methods cannot accurately reconstruct such underdetermined systems. We use an iterative reconstruction algorithm called Generalized Alternating Projection (GAP) that exploits image and video models (priors) to effectively solve this ill-posed inverse problem [21].

GAP takes advantage of the structural sparsity of the subframes in transform domains such as wavelets and the discrete cosine transform (DCT). Fig. 9 illustrates the underlying principle of GAP. It is worth noting that GAP is based on the volume's global sparsity and requires no training whatsoever. In other words, GAP is a universal reconstruction algorithm insensitive to the data being inverted.

The GAP algorithm makes use of Euclidean projections on two convex sets, which respectively enforce data fidelity and structural sparsity; please refer to [21] for details. Furthermore, GAP is an anytime algorithm: the results produced by the algorithm converge monotonically to the true value as the computation proceeds. The monotonicity has been generally observed in our extensive experiments and theoretically established under a set of sufficient conditions on the forward model. The reconstructed subframes continually improve over successive iterations, and the user can halt computation at any time to obtain intermediate results. The user may then continue improving the reconstruction by resuming the computation. The following briefly reviews the main steps of the GAP algorithm [21].



4.1. The Linear Manifold 

Data fidelity is ensured by projection onto the linear manifold $\Pi = \{f : \sum_{k=1}^{N_F} H_{i,j,k} f_{i,j,k} = g_{i,j},\ \forall (i,j)\}$, which consists of all legitimate high-speed frames of $f$ integrated onto the detector via Eq. (3). In other words, $\Pi$ is the set of solutions to an underdetermined system of linear equations, which are disambiguated by using structural sparsity.

[Fig. 8 panel (a): reconstruction time vs. number of estimated frames]

Fig. 8. Algorithm convergence times and relative residual reconstruction errors for various compression ratios. (a) GAP's reconstruction time increases linearly with data size. Tests were performed on an ASUS U46E laptop (Intel quad-core i7 operated at 3.1 GHz). (b) Normalized $\ell_2$ reconstruction error vs. number of reconstructed frames. The residual error reaches a minimum at critical temporal sampling and gradually flattens out with finer temporal interpolation (lower $d$).

4.2. The Weighted ℓ2,1 Ball

Structural sparsity is encoded by $\mathcal{G} = \{G_1, G_2, \ldots, G_m\}$, a partition of the indices $(i,j,k)$ that span the voxels of $f$, and the associated weights $\beta = \{\beta_l : \beta_l > 0,\ l = 1, 2, \ldots, m\}$. The weighted ℓ2,1 ball is defined as $\Lambda(R) = \{f : \|Q(f)\|_{\mathcal{G}\beta} \le R\}$, where $Q(f)$ is an orthonormal transform (wavelet transformation or discrete cosine transformation (DCT)) of the volume $f$,

$$ Q(f)_{i,j,k} = \sum_{i',j',k'} \Psi_1(i,i')\,\Psi_2(j,j')\,\Psi_3(k,k')\, f_{i',j',k'}, $$

with $\Psi_1, \Psi_2, \Psi_3$ orthonormal matrices, and

$$ \|w\|_{\mathcal{G}\beta} = \sum_{l=1}^{m} \beta_l \|w_{G_l}\|_2 $$

is a weighted ℓ2,1 norm of $w$. Note that $\Lambda(R)$ is constructed as a weighted ℓ2,1 ball in the space of transform coefficients $w = Q(f)$, since structural sparsity is desired for the coefficients instead of the voxels. The ball is rotated in the voxel space due to the orthonormal transform $Q(\cdot)$.

[Fig. 9 panel label: time domain]

Fig. 9. Illustration of the GAP algorithm.
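The weighted ℓ2,1 norm is simple to evaluate once the partition $\mathcal{G}$ is stored as an integer group label per coefficient. The sketch below assumes that label encoding (a representation choice, not something the text prescribes); `weighted_l21` is a hypothetical name.

```python
import numpy as np


def weighted_l21(w: np.ndarray, labels: np.ndarray, beta: np.ndarray) -> float:
    """Weighted l2,1 norm: sum over l of beta_l * ||w_{G_l}||_2.

    labels assigns each coefficient of w to a group index 0..m-1, which
    encodes the partition G; beta holds the positive group weights.
    """
    group_sq = np.bincount(labels.ravel(), weights=(w ** 2).ravel(), minlength=beta.size)
    return float(np.sum(beta * np.sqrt(group_sq)))


w = np.array([3.0, 4.0, 0.0, 5.0])
labels = np.array([0, 0, 1, 1])
print(weighted_l21(w, labels, np.array([1.0, 2.0])))  # 15.0
```

With singleton groups and unit weights, e.g. `weighted_l21(w, np.arange(4), np.ones(4))`, the norm reduces to the ordinary ℓ1 norm (12.0 here), which is the simplest special case of structural sparsity.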

4.3. Euclidean Projections 

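The Euclidean projection of $\theta$ onto $\Pi$,

$$ P_{\Pi}(\theta) = \arg\min_{f \in \Pi} \sum_{i,j,k} \left(f_{i,j,k} - \theta_{i,j,k}\right)^2, $$

is given by

$$ f_{i,j,k} = \theta_{i,j,k} + \frac{H_{i,j,k}}{\sum_{k'=1}^{N_F} H_{i,j,k'}^2}\left(g_{i,j} - \sum_{k'=1}^{N_F} H_{i,j,k'}\,\theta_{i,j,k'}\right). \tag{10} $$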
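Because $HH^\top$ is diagonal for this forward model, Eq. (10) decouples into independent per-pixel updates. The NumPy sketch below implements it on $(\text{rows}, \text{cols}, N_F)$ arrays; treating pixels whose codes are closed in every channel as unconstrained is an assumption added to avoid division by zero, and `project_linear` is a hypothetical name.

```python
import numpy as np


def project_linear(theta: np.ndarray, g: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Euclidean projection of theta onto Pi = {f : sum_k H_ijk f_ijk = g_ij} (Eq. 10).

    theta, H : (rows, cols, N_F) arrays; g : (rows, cols) coded snapshot.
    Pixels with sum_k H_ijk^2 = 0 impose no constraint and are left unchanged.
    """
    norm = np.sum(H ** 2, axis=2)             # sum_k' H_ijk'^2
    residual = g - np.sum(H * theta, axis=2)  # g_ij - sum_k' H_ijk' theta_ijk'
    scale = np.divide(residual, norm, out=np.zeros_like(residual), where=norm > 0)
    return theta + H * scale[:, :, None]


rng = np.random.default_rng(2)
H = (rng.random((8, 8, 6)) < 0.5).astype(float)
f_true = rng.random((8, 8, 6))
g = np.sum(H * f_true, axis=2)

f = project_linear(np.zeros_like(f_true), g, H)
print(np.allclose(np.sum(H * f, axis=2), g))  # True
```

Projecting a point that already lies on $\Pi$ returns it unchanged, as expected of a projection.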



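The Euclidean projection of $f$ onto $\Lambda(R)$,

$$ P_{\Lambda(R)}(f) = \arg\min_{\theta \in \Lambda(R)} \sum_{i,j,k} \left(\theta_{i,j,k} - f_{i,j,k}\right)^2, $$

can be equivalently written as

$$ P_{\Lambda(R)}(f) = Q^{-1}\left(\arg\min_{w:\ \|w\|_{\mathcal{G}\beta} \le R}\ \sum_{i,j,k} \left(w_{i,j,k} - Q(f)_{i,j,k}\right)^2\right), $$

using the fact that Euclidean distance is invariant to the orthonormal transform $Q(\cdot)$. We are only interested in $P_{\Lambda(R)}(f)$ when $R$ takes the special values considered below.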



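4.4. Alternating Projection between Π and Λ(R) when R Systematically Changes

The GAP algorithm is a sequence of Euclidean projections between a linear manifold and a weighted ℓ2,1 ball that undergoes a systematic change in size. Starting from $\theta^{(0)} = 0$, the GAP algorithm iterates between the following two steps until $\|f^{(t)} - \theta^{(t)}\|_2$ converges in $t$.

1) Projection on the linear manifold,

$$ f^{(t)} = P_{\Pi}\left(\theta^{(t-1)}\right), \quad t \ge 1, $$

with the solution given in Eq. (10).

2) Projection on the weighted ℓ2,1 ball of changing size,

$$ \theta^{(t)} = P_{\Lambda(R^{(t)})}\left(f^{(t)}\right), \quad t \ge 1, $$

where

$$ R^{(t)} = \sum_{q=1}^{m^*} \beta_{l_q}^2 \left( \frac{\|w^{(t)}_{G_{l_q}}\|_2}{\beta_{l_q}} - \frac{\|w^{(t)}_{G_{l_{m^*+1}}}\|_2}{\beta_{l_{m^*+1}}} \right), \qquad w^{(t)} = Q\left(f^{(t)}\right), $$

$(l_1, l_2, \ldots, l_m)$ is a permutation of $(1, 2, \ldots, m)$ such that

$$ \frac{\|w^{(t)}_{G_{l_q}}\|_2}{\beta_{l_q}} \ge \frac{\|w^{(t)}_{G_{l_{q+1}}}\|_2}{\beta_{l_{q+1}}} $$

holds for any $q \le m - 1$, and $m^* = \min\{z : \mathrm{cardinality}(\cup_{q=1}^{z} G_{l_q}) > \mathrm{cardinality}(g)\}$. The projection is given by $\theta^{(t)} = Q^{-1}\left(\tilde{w}^{(t)}\right)$, where

$$ \tilde{w}^{(t)}_{i,j,k} = Q\left(f^{(t)}\right)_{i,j,k} \max\left\{1 - \frac{\beta_l}{\|w^{(t)}_{G_l}\|_2}\,\frac{\|w^{(t)}_{G_{l_{m^*+1}}}\|_2}{\beta_{l_{m^*+1}}},\ 0\right\}, \quad \forall (i,j,k) \in G_l. $$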
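The loop below is a deliberately simplified GAP sketch, not the authors' implementation. It assumes singleton groups and unit weights, so the weighted ℓ2,1 shrinkage reduces to soft-thresholding at the $(m^*+1)$-th largest coefficient magnitude, and it uses a separable orthonormal DCT-II as $Q(\cdot)$, applied through `np.einsum` exactly as the triple sum defining $Q(f)_{i,j,k}$. The names `dct_matrix` and `gap_l1` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (rows are basis vectors), used here as Psi."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D


def gap_l1(g: np.ndarray, H: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """Simplified GAP: singleton groups, beta_l = 1, separable DCT as Q.

    g : (rows, cols) coded snapshot; H : (rows, cols, N_F) code stack.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    P1, P2, P3 = (dct_matrix(n) for n in H.shape)
    norm = np.sum(H ** 2, axis=2)
    theta = np.zeros(H.shape)        # theta^(0) = 0
    m_star = g.size + 1              # min{z : z > cardinality(g)} for singleton groups
    for _ in range(n_iter):
        # Step 1: f^(t) = P_Pi(theta^(t-1)) via Eq. (10), pixel by pixel.
        residual = g - np.sum(H * theta, axis=2)
        scale = np.divide(residual, norm, out=np.zeros_like(residual), where=norm > 0)
        f = theta + H * scale[:, :, None]
        # Step 2: shrink w = Q(f^(t)); the threshold is the (m*+1)-th largest |w|.
        w = np.einsum("ai,bj,ck,ijk->abc", P1, P2, P3, f)
        lam = np.sort(np.abs(w), axis=None)[::-1][m_star]
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
        theta = np.einsum("ai,bj,ck,abc->ijk", P1, P2, P3, w)  # Q^{-1} = transpose
    return f


rng = np.random.default_rng(3)
H = (rng.random((16, 16, 4)) < 0.5).astype(float)
f_true = np.ones((16, 16, 4)) * np.linspace(0.0, 1.0, 4)  # toy scene brightening over time
g = np.sum(H * f_true, axis=2)
f_hat = gap_l1(g, H, n_iter=50)
print(f_hat.shape, np.allclose(np.sum(H * f_hat, axis=2), g))  # (16, 16, 4) True
```

Returning $f^{(t)}$ rather than $\theta^{(t)}$ guarantees that the estimate always satisfies the measurement constraint, which matches the anytime behavior described earlier: stopping after any iteration yields a data-consistent result. Structured groups and wavelet bases, as used in the text, would replace the singleton soft-threshold with the group shrinkage formula above.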



5. Results 



CACTI's experimental temporal superresolution results are shown in Figs. 10-13. An eye blinking, a lens falling in front of a hand, a chopper wheel with the letters 'DUKE' placed on the blades, and a bottle pouring water into a cup are captured at 30 fps and reconstructed with $N_F = C = 14$ and $N_F = 148$. There is little aesthetic difference between these reconstructions. In some cases, as with the lens and hand reconstruction, $N_F = 148$ appears to yield additional temporal information over $N_F = 14$. The upper-left images depict the sum of the reconstructed frames, showing the expected time-integrated snapshots acquired with a 30 fps video camera lacking spatiotemporal image plane modulation.

Note that several of these features, particularly the water pouring, are hardly visible among 
the moving code pattern. Objects exhibiting large temporal motion blur were reconstructed with 
GAP using DCT bases, while wavelet bases were used for the stationary reconstructions. 

The compression ratio $C$ is 14 rather than 16 because the triangle wave's peak and trough ($T_{s_1}$ and $T_{s_{16}}$) are not accurately characterized by linear motion due to the mechanical deceleration time, and were hence not placed into $H$ to reduce model error.

The CACTI system captures more unique coding projections of scene information when the mask moves $C$ pixels during the exposure (Fig. 14(b,d)) than when the mask is held stationary (Fig. 14(a,c)), thereby improving the reconstruction quality for detailed scenes. The stationary binary coded aperture may completely block small features, rendering the reconstruction difficult and artifact-ridden.

[Fig. 10 panels: high-speed reconstructed video frames (C = 14); 14 frames reconstructed]

Fig. 10. High-speed ($C = 14$) video of an eye blink, from closed to open, reconstructed from a single coded snapshot for $N_F = 14$ and $N_F = 148$. The numbers on the bottom-right of the pictures represent the frame number of the video sequence. Note that the eye is the only part of the scene that moves. The top-left frame shows the sum of these reconstructed frames, which approximates the motion captured by a 30 fps camera without a coded aperture modulating the focal plane.

[Fig. 11 panels: high-speed reconstructed video frames (C = 14); 148 frames reconstructed]

Fig. 11. Capture and reconstruction of a lens falling in front of a hand for $N_F = 14$ and $N_F = 148$. Notice that the reconstructed frames capture the magnification effects of the lens as it passes in front of the hand.

[Fig. 12 panels: high-speed reconstructed video frames (C = 14); 14 frames reconstructed (grid added to help visualize movement)]

Fig. 12. Capture and reconstruction of a letter 'D' placed at the edge of a chopper wheel rotating at 15 Hz for $N_F = 14$ and $N_F = 148$. The white part of the letter exhibits ghosting effects in the reconstructions due to ambiguities in the solution. The TwIST algorithm was used to reconstruct this data [22].

[Fig. 13 panels: 14 frames reconstructed; 148 frames reconstructed]

Fig. 13. Capture and reconstructed video of a bottle pouring water for $N_F = 14$ and $N_F = 148$. Note the time-varying specularities in the video. The TwIST algorithm was used to reconstruct this data [22].

[Fig. 14 panels: coded exposure, reconstructed image (still mask, frame 1 of 1); coded exposure, reconstructed image (moving mask, frame 7 of 14); coded exposure, reconstructed image (still mask, frame 1 of 1)]

Fig. 14. Spatial resolution tests of (a,b) an ISO 12233 resolution target and (c,d) a soup can. These objects were kept stationary several feet away from the camera. (a,c) show reconstructed results without temporally moving the mask; (b,d) show the same objects when reconstructed with temporal mask motion.

[Fig. 15 legend: implemented mask, mean PSNR 28.04 dB; simulated shifting mask, mean PSNR 28.97 dB; simulated random mask, mean PSNR 28.46 dB]

Fig. 15. Simulated and actual reconstruction PSNR by frame. (a), (b), and (c) show PSNR by high-speed, reconstructed video frame for 14 eye blink frames, 14 fan frames, and 210 fan frames, respectively, from snapshots $g$. Implemented mechanically translated masks (red curves), simulated translating masks (black curves), and simulated LCoS coding (blue curves) were applied to high-speed ground truth data and reconstructed using GAP.

Completely changing the coding patterns $C$ times during $\Delta_t$ in hardware is only possible with adequate fill factor by employing a reflective LCoS device to address each pixel $C$ times during the integration period. To compare the reconstruction fidelity of the low-bandwidth CACTI transmission function (Eq. (7)) with this modulation strategy, we present simulated PSNR values of videos in Fig. 15(a,b,c). For these simulations, reconstructed experimental frames at $N_F = 14$ were summed to emulate a time-integrated image. The high-speed reconstructed frames were used as ground truth. We reapply (1) the actual mask pattern; (2) a simulated CACTI mask moving with motion $s_k$; and (3) a simulated, re-randomized coding pattern to each high-speed frame used as ground truth. The reconstruction performance difference between translating the same code and re-randomizing the code for each of the $N_F$ reconstructed frames is typically within 1 dB.
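The comparison in Fig. 15 is reported as PSNR per reconstructed frame and as a mean over frames. The text does not state the peak value or normalization used, so the sketch below assumes intensities scaled to $[0, 1]$; `psnr` and `mean_psnr_by_frame` are hypothetical helper names. Identical frames give zero error and an infinite PSNR (NumPy will warn about the division).

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming intensities scaled to [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))


def mean_psnr_by_frame(ref_video: np.ndarray, est_video: np.ndarray, peak: float = 1.0):
    """Per-frame PSNR for (rows, cols, N_F) videos, plus the mean over frames."""
    per_frame = [psnr(ref_video[:, :, k], est_video[:, :, k], peak)
                 for k in range(ref_video.shape[2])]
    return per_frame, float(np.mean(per_frame))


ref = np.zeros((4, 4, 2))
est = ref + 0.1                   # uniform error of 0.1 in every pixel
frames, mean_db = mean_psnr_by_frame(ref, est)
print([round(v, 6) for v in frames], round(mean_db, 6))  # [20.0, 20.0] 20.0
```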



6. Discussion and Conclusion 

CACTI presents a new framework to uniquely code and decompress high-speed video exploiting conventional sensors with limited bandwidth. This approach benefits from mechanical simplicity, large compression ratios, inexpensive scalability, and extensibility to other frameworks.

We have demonstrated GAP, a new reconstruction algorithm that can use one of several bases to compactly represent a sparse signal. This fast-converging algorithm requires no prior knowledge of the target scene and will scale easily to larger image sizes and compression ratios. This algorithm was used for all reconstructions except Figs. 12 and 13.

Despite GAP's computational efficiency, large-scale CACTI implementations will seek to minimize the amount of data reconstructed, in addition to the amount transmitted, while still sufficiently representing the optical data stream. Future work will adapt the compression ratio $C$ such that the resulting reconstructed video requires the fewest computations to depict the motion of the scene with high quality.

Since coded apertures are passive elements, extending the CACTI framework to larger values of $N$ only requires a larger mask and a greater detector sensing area, making it a viable choice for large-scale compressive video implementations. As $N$ increases, LCoS-driven temporal compression strategies must modulate every pixel $C$ times per integration. Conversely, translating a passive transmissive element attains $C$ times the temporal resolution without using any additional bandwidth relative to conventional low-framerate capture.

Future large-scale imaging systems may employ CACTI's inexpensive coding strategy in conjunction with higher-dimensional imaging modalities, including spectral compressive video. Integrating CACTI with the current CASSI system [20] should provide preliminary reconstructions depicting 4-dimensional datasets $f(x,y,\lambda,t)$.



